
fix: prevent KeyError in _FunctionToolBatchExecutor under eager_task_factory #2733

Closed
guoyangzhen wants to merge 1 commit into openai:main from guoyangzhen:fix/eager-task-factory-keyerror

Conversation

@guoyangzhen

Problem

After upgrading from openai-agents==0.9.3 to 0.12.5, streamed runs with parallel tool calls fail with:

KeyError: <Task finished name='None' exception=KeyError(...)>

This only reproduces under asyncio.eager_task_factory (used by the Textual TUI framework on Python 3.12+).

Root Cause

In _FunctionToolBatchExecutor._create_tool_task():

task = asyncio.create_task(self._run_single_tool(...))  # ← task starts running here
self.task_states[task] = task_state                      # ← registered AFTER

Under eager_task_factory, create_task() starts executing the coroutine immediately, before it returns. The task can complete and be processed by _partition_pending_tasks, which looks up self.task_states[task], before self.task_states[task] = task_state executes, causing the KeyError.
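
The ordering problem can be seen in isolation with a minimal sketch. It does not use the library's code: run_single_tool, task_states and creation_order are hypothetical stand-ins, and the stand-in tool has no suspension point, which is the case where eager execution finishes the task inside create_task(). It assumes Python 3.12+ for asyncio.eager_task_factory.

```python
import asyncio


def creation_order(use_eager: bool) -> list[str]:
    """Return the order in which the task body and the registration ran."""
    events: list[str] = []
    task_states: dict = {}  # hypothetical stand-in for the executor's mapping

    async def run_single_tool() -> None:
        # No suspension point, so an eager task completes inside create_task().
        events.append("tool ran")

    async def main() -> None:
        if use_eager:
            # Requires Python 3.12+.
            asyncio.get_running_loop().set_task_factory(asyncio.eager_task_factory)
        task = asyncio.create_task(run_single_tool())
        task_states[task] = "state"  # registration happens after create_task()
        events.append("registered")
        await task

    asyncio.run(main())
    return events


if __name__ == "__main__":
    print("default factory:", creation_order(False))
    print("eager factory:  ", creation_order(True))
```

With the default factory the registration runs first. With the eager factory the tool body has already run by the time the registration line executes, so any code that looks up the task in between finds no entry.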

The issue reporter confirmed this with a minimal reproduction showing the scheduling problem on Python 3.13.

Fix

Wrap _run_single_tool in an inner async def so that eager_task_factory runs the wrapper's synchronous preamble (which is empty) and yields at the first await, returning control to _create_tool_task to finish task_states registration.

This is preferred over "register before create_task" because the task object (needed as the dict key) is only available from create_task()'s return value.

Testing

The fix ensures that when eager_task_factory eagerly executes the wrapper:

  1. The wrapper's synchronous body runs (nothing happens — it's empty)
  2. The wrapper hits await self._run_single_tool(...) and suspends
  3. Control returns to _create_tool_task
  4. self.task_states[task] = task_state executes — state is now registered
  5. When the event loop resumes the wrapper, _run_single_tool runs safely
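
The sequence above can be checked with a small experiment. One caveat applies, stated here as a general property of Python rather than something verified against this codebase: await on a coroutine runs it inline and only suspends when something inside actually yields to the event loop. The sketch below therefore compares a plain wrapper with one that adds an explicit await asyncio.sleep(0). The names run_single_tool, wrapper, task_states and registration_order are hypothetical stand-ins, and the sleep(0) variant is an illustration, not what this PR implements. It assumes Python 3.12+.

```python
import asyncio


def registration_order(explicit_yield: bool) -> list[str]:
    """Run a wrapped tool under eager_task_factory and report event order."""
    events: list[str] = []
    task_states: dict = {}  # hypothetical stand-in for the executor's mapping

    async def run_single_tool() -> None:
        # Stand-in tool that finishes without ever suspending.
        events.append("tool ran")

    async def wrapper() -> None:
        if explicit_yield:
            # Hands control back to the event loop exactly once.
            await asyncio.sleep(0)
        # Awaiting a coroutine runs it inline; it suspends only if the
        # coroutine itself reaches a real suspension point.
        await run_single_tool()

    async def main() -> None:
        # Requires Python 3.12+.
        asyncio.get_running_loop().set_task_factory(asyncio.eager_task_factory)
        task = asyncio.create_task(wrapper())
        task_states[task] = "state"
        events.append("registered")
        await task

    asyncio.run(main())
    return events


if __name__ == "__main__":
    print("plain wrapper:   ", registration_order(False))
    print("wrapper + yield: ", registration_order(True))
```

In this sketch the plain wrapper still runs the tool before registration when the tool never suspends, while the variant with the explicit yield registers first. Whether the real _run_single_tool always reaches a suspension point is not shown in this PR, so a unit test along these lines would help settle it.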

…factory (openai#2729)

Under asyncio.eager_task_factory (used by Textual on Python 3.12+),
create_task() starts executing the coroutine before returning. This
creates a race where the task can complete and be processed by
_partition_pending_tasks before self.task_states[task] is registered,
causing KeyError.

Fix: wrap _run_single_tool in an inner async function so eager
execution runs the empty synchronous preamble and yields at the first
await, returning control to _create_tool_task to finish registration.

The wrapper approach is preferred over registering before create_task
because we need the task object as the dict key.
github-actions bot added the bug (Something isn't working) and feature:core labels on Mar 20, 2026
@seratch
Member

seratch commented Mar 20, 2026

@guoyangzhen thanks for sharing this. I believe my PR #2731 should already resolve the issue, but do you still see some patterns that are not covered by my change? If so, can you add unit tests covering the patterns as well?

seratch added the needs-more-info (Waiting for a reply/more info from the author) label on Mar 20, 2026
seratch marked this pull request as draft on March 20, 2026, 04:24
